
    A Comparison of Hybrid and End-to-End Models for Syllable Recognition

    This paper presents a comparison of a traditional hybrid speech recognition system (Kaldi, using WFSTs and a TDNN trained with lattice-free MMI) and a lexicon-free end-to-end model (a TensorFlow implementation of a multi-layer LSTM with CTC training) for German syllable recognition on the Verbmobil corpus. The results show that explicitly modeling prior knowledge is still valuable in building recognition systems. With a strong syllable-based language model (LM), the structured approach significantly outperforms the end-to-end model. The best word error rate (WER) with respect to syllables, 10.0%, was achieved using Kaldi with a 4-gram LM modeling all syllables observed in the training set, compared to a best WER of 27.53% for the end-to-end approach. The work presented here has implications for building future recognition systems that operate independently of a large vocabulary, as typically needed in tasks such as recognition of syllabic or agglutinative languages, out-of-vocabulary techniques, keyword search indexing, and medical speech processing.
    Comment: 22nd International Conference on Text, Speech and Dialogue, TSD201
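    The syllable-level WER figures quoted above follow the standard word error rate definition: the Levenshtein edit distance between the reference and hypothesis token sequences, divided by the reference length. A minimal generic sketch (an illustration of the metric, not code from the paper; the example syllables are invented):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences; substitutions,
    insertions, and deletions all cost 1."""
    m, n = len(ref), len(hyp)
    # dp[j] holds the distance between ref[:i] and hyp[:j]
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = min(
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution or match
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
            )
            prev, dp[j] = dp[j], cur
    return dp[n]

def wer(ref, hyp):
    """Error rate of hypothesis tokens measured against reference tokens."""
    return edit_distance(ref, hyp) / len(ref)

# Syllable tokens instead of word tokens, as in the paper's evaluation:
ref = "gu ten mor gen".split()
hyp = "gu tem mor gen gen".split()
print(wer(ref, hyp))  # 1 substitution + 1 insertion over 4 syllables -> 0.5
```

    The same formula applies whether the tokens are words or syllables, which is why the paper can report a "WER w.r.t. syllables".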

    Integration of prosodic and grammatical information in the analysis of dialogs


    Generating Automated News to Explain the Meaning of Sensor Data

    An important competence of human data analysts is to interpret and explain the meaning of the results of data analysis to end-users. However, existing automatic solutions for intelligent data analysis provide limited help in interpreting and communicating information to non-expert users. In this paper we present a general approach to generating explanatory descriptions of the meaning of quantitative sensor data. We propose a type of web application: a virtual newspaper with automatically generated news stories that describe the meaning of sensor data. This solution integrates a variety of techniques from intelligent data analysis into a web-based multimedia presentation system. We validated our approach on a real-world problem and demonstrated its generality using data sets from several domains. Our experience shows that this solution can facilitate the use of sensor data by general users and, therefore, can increase the utility of sensor network infrastructures.

    Generating multimedia presentations: from plain text to screenplay

    In many Natural Language Generation (NLG) applications, the output is limited to plain text, i.e., a string of words with punctuation and paragraph breaks, but no indications for layout, pictures, or dialogue. In several projects, we have begun to explore NLG applications in which these extra media are brought into play. This paper gives an informal account of what we have learned. For coherence, we focus on the domain of patient information leaflets, and follow an example in which the same content is expressed first in plain text, then in formatted text, then in text with pictures, and finally in a dialogue script that can be performed by two animated agents. We show how the same meaning can be mapped to realisation patterns in different media, and how the expanded options for expressing meaning are related to the perceived style and tone of the presentation. Throughout, we stress that the extra media are not simply added to plain text, but integrated with it: thus the use of formatting, or pictures, or dialogue, may require radical rewording of the text itself.
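    The idea of mapping one meaning to realisation patterns in different media can be sketched as follows. The content fields, templates, and media names here are illustrative assumptions, not the patterns used in the patient-information-leaflet project; the point is that each medium rewords the same content rather than decorating one fixed string:

```python
# One abstract content unit, rendered differently per medium.
content = {"drug": "the inhaler", "times": 3}

def plain_text(c):
    """Plain-text realisation: a single sentence."""
    return f"Shake {c['drug']} {c['times']} times before use."

def formatted_text(c):
    """Formatted realisation: the same instruction as a numbered list."""
    return f"Before use:\n  1. Shake {c['drug']} ({c['times']} times)."

def dialogue_script(c):
    """Dialogue realisation: the meaning redistributed over two agents."""
    return [
        ("Patient", f"How do I prepare {c['drug']}?"),
        ("Doctor", f"Shake it {c['times']} times first."),
    ]

print(plain_text(content))
print(formatted_text(content))
```

    Note that the dialogue version cannot be produced by reformatting the plain-text string; the content must be re-realised, which is the paper's central observation.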

    Intelligent libraries and apomediators: distinguishing between Library 3.0 and Library 2.0.

    Many terms and concepts have appeared in and disappeared from the history of librarianship. Currently, the use of the "point oh" naming system to label developments in librarianship is prevalent. Debate on the appropriateness, basis and syntax of this naming system is ongoing. Specifically, the profession has lately been engrossed in discourses in various contexts to unravel the real meaning and potential of Library 2.0. But even before this debate is settled, a new term, Library 3.0, is seeking space in the core librarianship lexicon. This development is causing confusion among librarianship scholars, practitioners and students, especially on whether there is any significant difference between the two models. Through documentary analysis, the authors explored the true meanings of these terms and have concluded that Library 2.0 and Library 3.0 are indeed different. The authors have also concluded that whereas Library 2.0 could be seen as attempting to weaken the role of librarians in the emerging information environment, Library 3.0 projects librarians as prominent apomediaries standing by and guiding library users on how best to locate, access and use credible information in myriad formats from diverse sources, at the point of need. The authors therefore note that the prospect of the Library 3.0 model has revived hope amongst the librarians who were uncomfortable with the crowd intelligence architecture on which the Library 2.0 model was founded. Similarly, the authors have concluded that Library 3.0 provides the tools and framework to organize the infosphere that Library 2.0 threw into disarray. Thus Library 3.0 is generally understood to be an improvement of Library 2.0 tools and techniques. The authors propose that a 3.0 library be perceived as a personalizable, intelligent, sensitive and living institution created and sustained by a seamless engagement of library users, librarians and subject experts on a federated network of information pathways.

    Leolani: a reference machine with a theory of mind for social communication

    Our state of mind is based on experiences and what other people tell us. This may result in conflicting information, uncertainty, and alternative facts. We present a robot that models the relativity of knowledge and perception within social interaction, following principles of the theory of mind. We utilized the vision and speech capabilities of a Pepper robot to build an interaction model that stores interpretations of perceptions and conversations in combination with provenance on their sources. The robot learns directly from what people tell it, possibly in relation to its perception. We demonstrate how the robot's communication is driven by a hunger to acquire more knowledge from and about people and objects, to resolve uncertainties and conflicts, and to share awareness of the perceived environment. Likewise, the robot can make reference to the world, to its knowledge about the world, and to the encounters with people that yielded this knowledge.
    Comment: Invited keynote at the 21st International Conference on Text, Speech and Dialogue, https://www.tsdconference.org/tsd2018
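    The interaction model described above (interpretations stored together with provenance on their sources, so that conflicts can be detected and resolved) can be sketched as a simple claim store. All class, field, and example names below are illustrative assumptions, not Leolani's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    # One interpreted statement plus its provenance: who asserted it
    # and through which modality (speech vs. vision).
    subject: str
    predicate: str
    value: str
    source: str    # e.g. a speaker's name or "camera"
    modality: str  # "heard" or "perceived"

class KnowledgeStore:
    """Stores claims with provenance and surfaces conflicts, i.e. the
    same (subject, predicate) asserted with different values."""

    def __init__(self):
        self.claims = []

    def add(self, claim):
        self.claims.append(claim)

    def conflicts(self, subject, predicate):
        values = {}
        for c in self.claims:
            if (c.subject, c.predicate) == (subject, predicate):
                values.setdefault(c.value, []).append(c.source)
        # A conflict exists when more than one distinct value is recorded.
        return values if len(values) > 1 else {}

store = KnowledgeStore()
store.add(Claim("apple", "color", "green", "Lenka", "heard"))
store.add(Claim("apple", "color", "red", "camera", "perceived"))
print(store.conflicts("apple", "color"))
# {'green': ['Lenka'], 'red': ['camera']}
```

    A detected conflict like this is exactly the kind of uncertainty the robot's "hunger" for knowledge would drive it to resolve, e.g. by asking a follow-up question.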

    Familial thrombocytopenia due to a complex structural variant resulting in a WAC-ANKRD26 fusion transcript

    Get PDF
    Advances in genome sequencing have resulted in the identification of the causes of numerous rare diseases. However, many cases remain unsolved with standard molecular analyses. We describe a family presenting with a phenotype resembling inherited thrombocytopenia 2 (THC2). THC2 is generally caused by single nucleotide variants that prevent silencing of ANKRD26 expression during hematopoietic differentiation. Short-read whole-exome and genome sequencing approaches were unable to identify a causal variant in this family. Using long-read whole-genome sequencing, a large complex structural variant involving a paired-duplication inversion was identified. Through functional studies, we show that this structural variant results in a pathogenic gain-of-function WAC-ANKRD26 fusion transcript. Our findings illustrate how complex structural variants that may be missed by conventional genome sequencing approaches can cause human disease.